Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks
It is known that current graph neural networks (GNNs) are difficult to make deep due to the problem known as over-smoothing. Multi-scale GNNs are a promising approach for mitigating the over-smoothing problem. However, there is little explanation from the viewpoint of learning theory for why they work empirically. In this study, we derive optimization and generalization guarantees for transductive learning algorithms that include multi-scale GNNs. Using boosting theory, we prove the convergence of the training error under weak learning-type conditions. By combining this with generalization gap bounds in terms of transductive Rademacher complexity, we show that, under certain conditions, the test error bound of a specific type of multi-scale GNN decreases with the number of node aggregations. Our results offer theoretical explanations for the effectiveness of the multi-scale structure against the over-smoothing problem. We apply boosting algorithms to the training of multi-scale GNNs for real-world node prediction tasks. We confirm that their performance is comparable to that of existing GNNs, and that their practical behavior is consistent with our theoretical observations.
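To make the training procedure concrete, the following is a minimal sketch (not the authors' implementation) of boosting-style transductive training of a multi-scale GNN: at each round the node features are aggregated once more over the graph, a weak learner is fitted to the current pseudo-residuals on the labeled nodes, and the ensemble prediction is updated with a step size. Helper names such as `normalize_adjacency`, `WeakLearner`, and `gb_gnn_train` are hypothetical placeholders.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalized adjacency with self-loops: D^{-1/2}(A + I)D^{-1/2}."""
    A = A + np.eye(A.shape[0])
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A @ D_inv_sqrt

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

class WeakLearner:
    """A ridge-regression base predictor fitted to pseudo-residuals (one boosting round)."""
    def __init__(self, reg=1e-2):
        self.reg = reg
    def fit(self, X, residual):
        d = X.shape[1]
        self.W = np.linalg.solve(X.T @ X + self.reg * np.eye(d), X.T @ residual)
        return self
    def predict(self, X):
        return X @ self.W

def gb_gnn_train(A, X, y_onehot, train_mask, n_rounds=10, lr=0.5):
    """Boost weak learners; at round t the features have been aggregated t times (G^t X)."""
    G = normalize_adjacency(A)
    F = np.zeros_like(y_onehot)              # current ensemble prediction for all nodes
    X_t = X.copy()
    learners = []
    for _ in range(n_rounds):
        X_t = G @ X_t                        # one more linear aggregation step (multi-scale)
        residual = y_onehot - softmax(F)     # functional gradient of the cross-entropy loss
        b = WeakLearner().fit(X_t[train_mask], residual[train_mask])
        F = F + lr * b.predict(X_t)          # add the weak learner's contribution
        learners.append(b)
    return F, learners
```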
Review for NeurIPS paper: Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks
Additional Feedback: I read the author feedback. It answers my question well and is consistent with what I assumed in the original review. Therefore, I maintain my positive evaluation. In my understanding, in a standard multi-scale GNN there are nonlinear activations between the aggregation functions G. In this paper, there is no nonlinear activation between the aggregation functions G; the nonlinear activation appears only in B. Therefore, the "graph" part G is always linear. Is there such a multi-scale GNN in the literature?
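To illustrate the structure the reviewer describes, here is a minimal sketch under the assumption that each weak learner has the form B(G^t X): the aggregation G is applied repeatedly with no activation in between, so the graph part stays linear, and the only nonlinearity sits inside the base transformation B (here a one-hidden-layer MLP). All function and parameter names are illustrative.

```python
import numpy as np

def aggregate(G, X, t):
    # Purely linear multi-scale features: X, GX, G^2 X, ..., G^t X.
    # No activation is applied between aggregation steps.
    for _ in range(t):
        X = G @ X
    return X

def base_transform(X, W1, b1, W2, b2):
    # Nonlinear part B: a one-hidden-layer MLP applied node-wise.
    H = np.maximum(0.0, X @ W1 + b1)   # ReLU nonlinearity lives only inside B
    return H @ W2 + b2
```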
Review for NeurIPS paper: Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks
The paper considers multi-scale GNNs, which have been shown to address over-smoothing issues with standard GNNs, and establishes optimization and generalization guarantees from the perspective of gradient boosting. The paper also proposes GB-GNN with linear transformations and shows that the model can be competitive with the state of the art. Most reviewers felt that the work presents a unique perspective on the performance of multi-scale GNNs. There are some concerns regarding the work: the technical results follow from assumptions and existing results on transductive learning, so the core technical novelty is limited, and the analysis covers linear transformations, which differ from the nonlinear transformations often used in practice.